Abstract: In this paper, we treat constrained optimization of nonlinear systems. A
rigorous solution method is presented for obtaining nearly optimal state feedback
control that takes into consideration actuator saturation, state space constraints,
and minimum-time control requirements. The constraints are encoded into the
optimization formulation through special nonquadratic functionals. The associated
Hamilton-Jacobi-Bellman (HJB) equation is then solved successively. Nonlinear
approximating networks are used to obtain an approximate closed-form solution for the
value function of the HJB equation, which is then used to obtain a state feedback
controller. The solution is carried out over a compact set of the asymptotic
stability region of an initial stabilizing control. Copyright © 2002 IFAC

Keywords:  Actuator  saturation;  Constraints;  Minimum-time  control;  Neural  network; 
Optimal  control. 


1. INTRODUCTION

Most control systems are required to work under various types of constraints and
performance requirements. Many of these constraints can be classified into two main
categories. The first is due to physical limitations on the control input to the
plant; a common phenomenon in this category is actuator saturation. The second is due
to physical limitations of the plant dynamics itself, for example constraints on the
states of the dynamical system. An interesting class of optimization problems that
arises in the presence of actuator saturation is the minimum-time control problem.
Therefore, developing control laws that are optimized for various combinations of
system and control constraints and performance requirements is at the heart of
control theory.

The control of systems with saturating actuators has been the focus of many
researchers for many years. Several methods for deriving control laws that account
for the saturation phenomenon are found in (Saberi, et al., 1996; Sussmann, et al.,
1994). Other methods that deal with constraints on the states of the system as well
as the control inputs are found in (Bitsoris and Gravalou, 1999; Henrion, et al.,
2001; Gilbert and Tan, 1991). They are based on mathematical programming and, in some
cases, on set invariance theory. The focus in these papers is on finding stabilizing
controllers that are not necessarily in state feedback form, without considering
general optimization issues.


Bernstein (1995) studied the performance optimization of saturated actuators. In the
work by Lyshevski (1995), (1996), (1998), (2001a), (2001b), a general framework for
finding optimal state feedback control is formulated. He proposes the use of
nonquadratic performance functionals that enable encoding various kinds of
constraints on the control system. These performance functionals are used along with
the HJB equation that appears in optimal control theory (Lewis and Syrmos, 1995). It
is this framework that will be the focus of this paper.

For linear systems with quadratic performance functionals, the HJB equation becomes
the algebraic Riccati equation, which is easy to solve. However, when the performance
functionals are nonquadratic, or if the dynamics of the system are nonlinear, the
resulting HJB equation is highly nonlinear and is difficult to solve.

Approximate solution of the HJB equation has been addressed using many techniques by
Saridis and Lee (1979), Beard (1995), Beard, et al. (1997), (1998), Murray, et al.
(2001), Bertsekas and Tsitsiklis (1995), Munos, et al. (1999), Kim and Lewis (2000),
Han and Balakrishnan (2000), Liu and Balakrishnan (2000), Huang and Lin (1995), and
others.

We  are  interested  in  closed  form  solution  of  the  HJB 
equation  which  leads  to  a  state  feedback  control.  An 
interesting  solution  method  of  the  HJB  equation  that 
results  in  closed  form  solution  is  developed  by  Beard 
(1995).  His  solution  method  is  based  on  a  Galerkin 
approximation  of  a  set  of  equations,  Generalized  HJB 
(GHJB),  that  appear  in  the  successive  approximation 
theory  developed  by  Saridis  and  Lee  (1979).  The 
successive  approximation  method  improves  a  given 
initial  stabilizing  control.  It  reduces  to  the  well- 
known  Kleinman  (1968)  iterative  method  for  solving 
the  algebraic  Riccati  equation  for  linear  systems. 

In this paper, we focus on solving the HJB equation using the successive
approximation theory. Since it is developed for the case of a quadratic performance
functional on the control input, we start by extending the successive approximation
theory to the case of nonquadratic functionals. Then we use the method of weighted
residuals to get a neural network least-squares solution to the set of successive
equations, towards obtaining an approximate solution of the HJB equation with
nonquadratic performance functionals.

Neural networks have been used to control nonlinear systems, see (Chen and Liu, 1994;
Lewis, et al., 1999; Polycarpou, 1996; Rovithakis and Christodoulou, 1994; Sadegh,
1993; Sanner and Slotine, 1991). It has been shown that they can effectively extend
adaptive control techniques to nonlinearly parameterized systems. The status of
neural network control as of 2001 appears in (Narendra and Lewis, 2001).

In (Miller, et al., 1990), Werbos first proposed using neural networks to find
optimal control laws using the HJB equation. Parisini (1998) used neural networks to
derive optimal control laws for discrete-time stochastic nonlinear systems. In this
paper we employ neural networks to find a nearly optimal solution to the HJB equation
for constrained control systems. A preliminary report of this work appears in
(Abu-Khalaf and Lewis, 2002).


2. OPTIMAL CONTROL AND THE
GENERALIZED HAMILTON-JACOBI-BELLMAN
EQUATION (GHJB)

Consider an affine-in-the-control nonlinear dynamical system of the form

$\dot{x} = f(x) + g(x)u(x)$ (1)

where $x \in \mathbb{R}^n$, $u: \mathbb{R}^n \to \mathbb{R}^m$, $f: \mathbb{R}^n \to \mathbb{R}^n$, and
$g: \mathbb{R}^n \to \mathbb{R}^{n \times m}$. Assume that $f + gu$ is Lipschitz continuous on a
set $\Omega$ in $\mathbb{R}^n$ containing the origin, and that the system (1) is controllable
in the sense that there exists a continuous control on $\Omega$ that asymptotically
stabilizes the system.

It is desired to find a control function $u: \mathbb{R}^n \to \mathbb{R}^m$ which minimizes a
generalized nonquadratic functional

$V = \int_0^\infty [Q(x) + W(u)]\,dt$ (2)

where $Q(x)$ is a positive definite, monotonically increasing function on $\Omega$, and
thus satisfies the observability condition. $W(u)$ is a positive definite integrand
function. For unconstrained control inputs, a common choice for $W(u)$ is

$W(u) = u^T R u$ (3)

where $R \in \mathbb{R}^{m \times m}$. Note that the control $u$ must not only stabilize the
system on $\Omega$, but also make the integral finite. Such controls are defined to be
admissible (Beard, 1995).

Definition 2.1: Admissible Controls

Let $\Psi(\Omega)$ denote the set of admissible controls. A control
$u: \mathbb{R}^n \to \mathbb{R}^m$ is defined to be admissible with respect to the state penalty
function $Q(x)$ on $\Omega$, denoted $u \in \Psi(\Omega)$, if:

• $u$ is continuous on $\Omega$,

• $u(0) = 0$,

• $u$ stabilizes (1) on $\Omega$,

• $\int_0^\infty [Q(x) + W(u)]\,dt < \infty$, $\forall x_0 \in \Omega$.

■


Differentiating $V$, the value function, along the system trajectories, we obtain
what is known as the GHJB equation,

$GHJB(V, u) \triangleq \frac{\partial V^T}{\partial x}(f + gu) + Q + u^T R u = 0, \quad V(0) = 0.$ (4)


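Note that the GHJB equation becomes the well-known HJB equation on substitution of
the optimal control

$u^*(x) = -\frac{1}{2} R^{-1} g^T(x) \frac{\partial V^*}{\partial x}$ (5)

where $V^*(x)$ is the unique optimal solution to the Hamilton-Jacobi-Bellman (HJB)
equation

$\frac{\partial V^{*T}}{\partial x} f + Q - \frac{1}{4} \frac{\partial V^{*T}}{\partial x} g R^{-1} g^T \frac{\partial V^*}{\partial x} = 0, \quad V^*(0) = 0.$ (6)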


It is shown in (Lyshevski and Meyer, 1995) that the value function obtained from (6)
serves as a Lyapunov function on $\Omega$. It is important to note that the GHJB is
linear in the value function derivative, while the HJB is nonlinear in the value
function derivative. Solving the GHJB requires solving linear partial differential
equations, while the HJB equation solution involves nonlinear partial differential
equations, which may be impossible to solve. This is the reason for introducing the
successive approximation technique using the GHJB, which is based on a sound proof in
(Saridis and Lee, 1979). In the successive approximation method, one solves (4) for
$V(x)$ given a stabilizing control $u(x)$, then finds an improved control based on
$V(x)$ using

$u = -\frac{1}{2} R^{-1} g^T \frac{\partial V}{\partial x}.$ (7)


Saridis and Lee (1979) show that if the initial control $u^{(0)} \in \Psi(\Omega)$, then
repetitive application of (4), (7) is a contraction map, and that the sequence of
solutions $V^{(i)}$ converges to the optimal HJB solution $V^*(x)$.
This  of  course  assumes  that  someone  can  solve 
exactly  for  (4)  at  each  step. 
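
For a scalar linear system with a quadratic cost, each GHJB solve in (4) reduces to a single linear equation in one unknown, and alternating (4) with (7) is Kleinman's iteration. The following is a minimal sketch under that assumption; the helper name, the plant parameters a = b = q = r = 1, and the initial gain k0 = 3 are illustrative choices and do not come from the paper.

```python
import math

def successive_approximation_scalar(a, b, q, r, k0, iterations=20):
    """Iterate (4) and (7) for x' = a*x + b*u with V(x) = p*x**2 and u = -k*x.

    Hypothetical helper for illustration; returns the list of p values.
    """
    if b * k0 - a <= 0.0:
        raise ValueError("the initial gain k0 must stabilize the system")
    k, history = k0, []
    for _ in range(iterations):
        # GHJB (4) with V = p*x**2: 2*p*(a - b*k) + q + r*k**2 = 0 (linear in p).
        p = (q + r * k * k) / (2.0 * (b * k - a))
        history.append(p)
        # Control update (7): u = -(1/2)*(1/r)*b*dV/dx = -(b*p/r)*x.
        k = b * p / r
    return history

history = successive_approximation_scalar(a=1.0, b=1.0, q=1.0, r=1.0, k0=3.0)
# Positive root of the scalar Riccati equation 2*a*p - (b*p)**2/r + q = 0.
p_riccati = 1.0 + math.sqrt(2.0)
print(history[:3], p_riccati)
```

In this sketch the sequence of p values decreases towards the Riccati solution, which is the scalar counterpart of the monotone convergence of $V^{(i)}$ described above.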

Although the GHJB equation is in theory easier to solve than the HJB equation, there
is no general closed-form solution available for this equation. Beard (1995) used
Galerkin's spectral method to get an approximate solution $V(x)$ of (4) at each
iteration, and proved convergence of the overall procedure. See (Beard, et al., 1997,
1998) for complete convergence proofs.

The Galerkin spectral method does not set the GHJB equation to zero at each
iteration, but to a residual error instead. It works by placing the solution $V(x)$
to the differential equation in a Hilbert space, and restricting it to a compact set
of the stability region of a known initial stabilizing control. It is assumed that

$V(x) = \sum_{j=1}^{\infty} c_j \phi_j(x),$

where $\phi_j: \Omega \to \mathbb{R}$, and $\phi_j(0) = 0$ to satisfy the boundary condition
$V(0) = 0$. It is also assumed that there exist coefficients $c_j$ such that

$V_L(x) \to V(x)$ as $L \to \infty$ (8)

where $V_L(x) = \sum_{j=1}^{L} c_j \phi_j(x)$.

These coefficients are found by setting the projection of the GHJB equation on the
finite basis $\Phi_L = \{\phi_j(x)\}_{j=1}^{L}$ to zero $\forall x \in \Omega$; i.e.

$\langle GHJB(V_L, u), \phi_j \rangle = 0, \quad j = 1, \ldots, L,$ (9)

where the inner product is defined as

$\langle f, g \rangle = \int_\Omega f(x) g(x)\,dx.$ (10)


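After the coefficients $c_j$ are found, the improved control law becomes

$u_L(x) = -\frac{1}{2} R^{-1} g^T(x) \frac{\partial V_L}{\partial x}(x) = -\frac{1}{2} R^{-1} g^T(x) \nabla\Phi_L^T c_L,$ (11)

where $c_L$ is the vector of coefficients $c_j$.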


In (Beard, et al., 1997), it is shown that the approximate GHJB solution converges
pointwise and uniformly on $\Omega$ as $L \to \infty$. Moreover, (Beard, et al., 1998)
shows that there exists an $L$ such that the successive approximation theory of
(Saridis and Lee, 1979) holds and works with approximate solutions.

The Galerkin approximation technique requires the evaluation of numerous integrals.
Moreover, in its current form, the successive approximation algorithm is unable to
deal with constrained optimization.


3.  SUCCESSIVE  APPROXIMATION  OF  HJB 
EQUATION  WITH  CONSTRAINTS 
ON  THE  CONTROL  SYSTEM 

To  be  able  to  encode  constraints  into  the  HJB 
equation,  complicated  nonquadratic  performance 
functionals  are  required.  Moreover,  the  successive 
approximation  theory  must  hold  for  the  nonquadratic 
performance  functional  used  so  that  it  can  be 
employed. 

In  this  section,  we  will  consider  three  cases  which 
require  three  types  of  nonquadratic  performance 
functionals. 


3.1 Actuator Saturation


To confront bounds on the control inputs of the system, a suitable generalized
nonquadratic functional is

$W(u) = 2\int_0^u (\phi^{-1}(v))^T R\,dv,$ (12)

where $\phi(\cdot)$ is a continuous, one-to-one, bounded, real-analytic, integrable
function of class $C^p$ ($p \ge 1$) with $\phi(0) = 0$, e.g. $\tanh(\cdot)$. $R$ is positive
definite and assumed to be symmetric for simplicity of analysis. This does not
restrict the design criteria on the control input vector, because the number of
coefficients that we can choose independently in the symmetric design matrix $R$ is
equal to the number of quadratic terms possible from the control input vector; that
is, $\frac{m(m-1)}{2} + m = \frac{m(m+1)}{2}$. Note that $W(u)$ is positive definite if
$\phi^{-1}(u)$ has the same sign as $u$ and $R$ is positive definite. Also, this functional
brings the control signal just to the level of saturation of the actuator, and allows
the control signal itself to saturate. Saturation allowance control techniques result
in nonlinear closed-loop dynamics, in contrast to saturation avoidance techniques
(Henrion, et al., 2001).

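For saturated controls, the GHJB design equations (4), (7) are replaced with

$\frac{\partial V^T}{\partial x}(f + gu) + Q + 2\int_0^u (\phi^{-1}(v))^T R\,dv = 0, \quad V(0) = 0,$ (13)

$u(x) = -\phi\left(\frac{1}{2} R^{-1} g^T(x) \frac{\partial V}{\partial x}\right).$ (14)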

Note  that  equation  (14)  guarantees  that  u(x)  is 
bounded. 
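
As a minimal sketch of (12) and (14), consider the scalar case with $\phi = \tanh$, for which the integral in (12) has a closed form. The function names and the scalar setting with $g = 1$ are assumptions made for illustration only.

```python
import math

def penalty_tanh(u, r=1.0, steps=2000):
    """W(u) = 2*r*integral_0^u atanh(v) dv by the midpoint rule (needs |u| < 1)."""
    h = u / steps
    return 2.0 * r * sum(math.atanh((i + 0.5) * h) for i in range(steps)) * h

def penalty_tanh_closed_form(u, r=1.0):
    """Same integral in closed form: 2*r*(u*atanh(u) + 0.5*log(1 - u**2))."""
    return 2.0 * r * (u * math.atanh(u) + 0.5 * math.log(1.0 - u * u))

def saturated_control(dV_dx, g=1.0, r=1.0):
    """Scalar version of (14): u = -tanh(0.5 * g * dV/dx / r)."""
    return -math.tanh(0.5 * g * dV_dx / r)

for u in (-0.9, -0.5, 0.0, 0.5, 0.9):
    print(u, penalty_tanh(u), penalty_tanh_closed_form(u))
print(saturated_control(1.0e6))  # stays within [-1, 1] for any gradient
```

The penalty is zero at $u = 0$, positive elsewhere, and grows without bound as $|u| \to 1$, while the control produced by (14) never leaves the interval $[-1, 1]$.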

If we substitute (14) into (13) we obtain the HJB equation for bounded controls. A
positive definite solution of this equation is the value function, and (14) then
gives the corresponding optimal stabilizing control. Existence and uniqueness of the
value function has been shown by Lyshevski (1996). This HJB equation cannot generally
be solved. There is no current method for rigorously confronting this type of
equation to find the value function of the system. Moreover, current solutions are
not well defined over a specific region in the state space.

Successive approximation using the GHJB has not yet been rigorously applied for
saturated controls. We will show that the successive approximation technique can be
used for constrained controls when certain restrictions on the control input are met.
The successive approximation technique is now applied to the new set of equations
(13), (14). The following lemma shows how equation (14) can be used to improve the
control law. It will be required that the control bound $\phi(\cdot)$ is monotonically
non-decreasing.


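Lemma 3.1: Improved Saturated Control Law

If $u^{(i)} \in \Psi(\Omega)$, and $V^{(i)}$ satisfies the equation $GHJB(V^{(i)}, u^{(i)}) = 0$
with the boundary condition $V^{(i)}(0) = 0$, then the new control derived as

$u^{(i+1)}(x) = -\phi\left(\frac{1}{2} R^{-1} g^T(x) \frac{\partial V^{(i)}}{\partial x}\right)$ (15)

is an admissible control for the system on $\Omega$. Moreover, if the control bound
$\phi(\cdot)$ is monotonically non-decreasing and $V^{(i+1)}$ is the unique positive definite
function satisfying the equation $GHJB(V^{(i+1)}, u^{(i+1)}) = 0$ with the boundary
condition $V^{(i+1)}(0) = 0$, then $V^{(i+1)}(x) \le V^{(i)}(x)$, $\forall x \in \Omega$.

Proof:

Admissibility: Since $V^{(i)}$ is continuously differentiable, the continuity
assumption on $g$ implies that $u^{(i+1)}$ is continuous. Since $V^{(i)}$ is positive
definite it attains a minimum at the origin, and thus $\partial V^{(i)}/\partial x$ must vanish
there. This implies that $u^{(i+1)}(0) = 0$.

Taking the derivative of $V^{(i)}$ along the system $(f, g, u^{(i+1)})$ trajectory we
have

$\dot{V}^{(i)}(x, u^{(i+1)}) = \frac{\partial V^{(i)T}}{\partial x} f + \frac{\partial V^{(i)T}}{\partial x} g u^{(i+1)}.$ (16)

But, since $GHJB(V^{(i)}, u^{(i)}) = 0$,

$\frac{\partial V^{(i)T}}{\partial x} f = -\frac{\partial V^{(i)T}}{\partial x} g u^{(i)} - Q(x) - 2\int_0^{u^{(i)}} (\phi^{-1}(v))^T R\,dv.$ (17)

This becomes

$\dot{V}^{(i)}(x, u^{(i+1)}) = -\frac{\partial V^{(i)T}}{\partial x} g u^{(i)} + \frac{\partial V^{(i)T}}{\partial x} g u^{(i+1)} - Q(x) - 2\int_0^{u^{(i)}} (\phi^{-1}(v))^T R\,dv.$ (18)

Making use of the fact that
$\frac{\partial V^{(i)T}}{\partial x} g(x) = -2\phi^{-1}(u^{(i+1)})^T R$, we get

$\dot{V}^{(i)}(x, u^{(i+1)}) = -Q(x) + 2\left[ \phi^{-1}(u^{(i+1)})^T R\,(u^{(i)} - u^{(i+1)}) - \int_0^{u^{(i)}} (\phi^{-1}(v))^T R\,dv \right].$ (19)

The second term in the previous equation is non-positive when $\phi^{-1}$, and thus
$\phi$, is monotonically non-decreasing. To see this, note that the design matrix $R$ is
symmetric positive definite, which means we can rewrite it as

$R = \Lambda \Sigma \Lambda$ (20)

where $\Sigma$ is a diagonal matrix with its values being the singular values of $R$ and
$\Lambda$ is an orthogonal symmetric matrix. Substituting (20) in (19) we get

$\dot{V}^{(i)}(x, u^{(i+1)}) = -Q(x) + 2\left[ \phi^{-1}(u^{(i+1)})^T \Lambda\Sigma\Lambda\,(u^{(i)} - u^{(i+1)}) - \int_0^{u^{(i)}} (\phi^{-1}(v))^T \Lambda\Sigma\Lambda\,dv \right].$ (21)

Applying the coordinate change $u = \Lambda^{-1} z$, equation (21) then becomes

$\dot{V}^{(i)}(x, u^{(i+1)}) = -Q(x) + 2\left[ \phi^{-1}(\Lambda^{-1} z^{(i+1)})^T \Lambda\Sigma\Lambda\,(\Lambda^{-1} z^{(i)} - \Lambda^{-1} z^{(i+1)}) - \int_0^{z^{(i)}} (\phi^{-1}(\Lambda^{-1}\zeta))^T \Lambda\Sigma\Lambda\Lambda^{-1}\,d\zeta \right]$

$= -Q(x) + 2\left[ \phi^{-1}(\Lambda^{-1} z^{(i+1)})^T \Lambda\Sigma\,(z^{(i)} - z^{(i+1)}) - \int_0^{z^{(i)}} (\phi^{-1}(\Lambda^{-1}\zeta))^T \Lambda\Sigma\,d\zeta \right]$

$= -Q(x) + 2\left[ \pi^T(z^{(i+1)})\,\Sigma\,(z^{(i)} - z^{(i+1)}) - \int_0^{z^{(i)}} \pi^T(\zeta)\,\Sigma\,d\zeta \right],$ (22)

where $\pi^T(z^{(i)}) = \phi^{-1}(\Lambda^{-1} z^{(i)})^T \Lambda$.

Since $\Sigma$ is a diagonal matrix, we can now decouple the transformed input vector
such that

$\dot{V}^{(i)}(x, u^{(i+1)}) = -Q(x) + 2\left[ \pi^T(z^{(i+1)})\,\Sigma\,(z^{(i)} - z^{(i+1)}) - \int_0^{z^{(i)}} \pi^T(\zeta)\,\Sigma\,d\zeta \right]$

$= -Q(x) + 2\sum_{k=1}^{m} \Sigma_{kk} \left[ \pi(z_k^{(i+1)})\,(z_k^{(i)} - z_k^{(i+1)}) - \int_0^{z_k^{(i)}} \pi(\zeta_k)\,d\zeta_k \right].$ (23)

Since the matrix $R$ is positive definite, the singular values $\Sigma_{kk}$ are all
positive. Also, from the geometrical meaning of

$\pi(z_k^{(i+1)})\,(z_k^{(i)} - z_k^{(i+1)}) - \int_0^{z_k^{(i)}} \pi(\zeta_k)\,d\zeta_k,$

this term is always non-positive if $\pi(z_k)$ is monotonically non-decreasing. But
since $\pi^T(z^{(i)}) = \phi^{-1}(\Lambda^{-1} z^{(i)})^T \Lambda$, it is easy to show that
$\phi^{-1}(\cdot)$ should be monotonically non-decreasing, and thus $\phi(\cdot)$ itself should
be monotonically non-decreasing. This implies that
$\dot{V}^{(i)}(x, u^{(i+1)}) < 0$ for $x \neq 0$ and that $V^{(i)}(x)$ is a Lyapunov function
for $u^{(i+1)}$ on $\Omega$. Following Definition 2.1, $u^{(i+1)}$ is admissible on $\Omega$.

To show the second part of Lemma 3.1, note that along the trajectories of
$(f, g, u^{(i+1)})$, $\forall x_0 \in \Omega$,

$V^{(i+1)}(x_0) - V^{(i)}(x_0) = -\int_0^\infty \frac{\partial (V^{(i+1)} - V^{(i)})^T}{\partial x} \left[ f + g u^{(i+1)} \right] d\tau.$ (24)

From $GHJB(V^{(i+1)}, u^{(i+1)}) = 0$ and $GHJB(V^{(i)}, u^{(i)}) = 0$, we have

$\frac{\partial V^{(i)T}}{\partial x} f = -\frac{\partial V^{(i)T}}{\partial x} g u^{(i)} - Q(x) - 2\int_0^{u^{(i)}} (\phi^{-1}(v))^T R\,dv,$ (25)

$\frac{\partial V^{(i+1)T}}{\partial x} f = -\frac{\partial V^{(i+1)T}}{\partial x} g u^{(i+1)} - Q(x) - 2\int_0^{u^{(i+1)}} (\phi^{-1}(v))^T R\,dv.$ (26)

Substituting (25), (26) in (24), and using
$\frac{\partial V^{(i)T}}{\partial x} g = -2\phi^{-1}(u^{(i+1)})^T R$, we get

$V^{(i+1)}(x_0) - V^{(i)}(x_0) = -2\int_0^\infty \left[ (\phi^{-1}(u^{(i+1)}))^T R\,(u^{(i+1)} - u^{(i)}) - \int_{u^{(i)}}^{u^{(i+1)}} (\phi^{-1}(v))^T R\,dv \right] d\tau.$ (27)

By decoupling equation (27) using $R = \Lambda\Sigma\Lambda$, it can be shown that

$V^{(i+1)}(x_0) \le V^{(i)}(x_0)$ (29)

when $\phi(\cdot)$ is monotonically non-decreasing.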

■ 
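
The sign argument in the proof can be checked numerically in the scalar case. The sketch below assumes $\phi = \tanh$ and $R = 1$ (illustrative choices) and evaluates the bracketed term of (19) on a grid of control values; the helper name is hypothetical.

```python
import math

def bracket_term(u_next, u_prev, r=1.0):
    """Scalar bracket of (19) for phi = tanh.

    atanh(u_next)*(u_prev - u_next) - integral_0^{u_prev} atanh(v) dv, times r.
    """
    antiderivative = u_prev * math.atanh(u_prev) + 0.5 * math.log(1.0 - u_prev ** 2)
    return r * (math.atanh(u_next) * (u_prev - u_next) - antiderivative)

grid = [i / 10.0 for i in range(-9, 10)]
worst = max(bracket_term(a, b) for a in grid for b in grid)
print(worst)  # largest value of the bracket over the grid
```

Because atanh is monotonically non-decreasing, the bracket never becomes positive on this grid, so the derivative in (19) is bounded above by $-Q(x)$.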

The next theorem is a key result on which the rest of the paper is based. It shows
that successive improvement of the saturated control law converges to the optimal
saturated control law for the given actuator saturation model $\phi(\cdot)$.


Theorem 3.2: Convergence of Successive Approximations
If $u^{(0)} \in \Psi(\Omega)$, then

1. $V^{(i)} \to V^*$ uniformly on $\Omega$,

2. $u^{(i)} \in \Psi(\Omega)$, $\forall i \ge 0$,

3. $u^{(i)} \to u^*$.

Proof: Start from $GHJB(V^{(0)}, u^{(0)}) = 0$ with the appropriate boundary condition.
From Lemma 3.1, we have that $u^{(1)} \in \Psi(\Omega)$ and that $V^{(1)} \le V^{(0)}$. By
induction, we have that $V^{(i+1)} \le V^{(i)} \le V^{(0)}$ and $u^{(i+1)} \in \Psi(\Omega)$. We can
repeat the argument used in the proof of Lemma 3.1 to show that $V^* \le V^{(i)}$,
$\forall i \ge 0$. Thus, $V^{(i)}$ is a monotonically decreasing sequence that is bounded
below. Hence $V^{(i)}$ converges to some $V^{(\infty)}$. It is then easily seen that

$GHJB(V^{(\infty)}, u^{(\infty)}) = HJB(V^{(\infty)}) = 0.$ (30)

Then $V^{(\infty)} = V^*$ because of the uniqueness of solutions of the HJB equation
(Lewis and Syrmos, 1995; Lyshevski, 1996). And it follows that $u^{(\infty)} = u^*$.

■

The  following  is  a  result  from  (Beard,  1995)  which 
we  tailor  here  to  the  case  of  saturated  control  inputs. 
It  basically  guarantees  that  improving  the  control  law 
does  not  reduce  the  region  of  asymptotic  stability  of 
the  initial  saturated  control  law. 

Lemma 3.3: Convergence of Stability Regions
The saturated optimal control $u^*$ is asymptotically stable on every stability region
associated with every improved control law $u^{(i)}$.

Proof: Lemma 3.1 showed that the saturated control $u^{(1)}$ is asymptotically stable
on $\Psi^{(0)}$, where $\Psi^{(0)}$ is the stability region of the saturated control
$u^{(0)}$. This implies that $\Psi^{(0)} \subseteq \Psi^{(1)}$. From Theorem 3.2, we know that
$\Psi^{(i+1)} \supseteq \Psi^{(i)}$ is true. By induction, this implies that

$\Psi^{(0)} \subseteq \Psi^{(1)} \subseteq \cdots \subseteq \Psi^{(\infty)}.$ (31)

■


Lemma 3.4: Optimal Saturated Control has the Largest Stability Region

The saturated control $u^*$ has a stability region that is the largest of any other
saturated control $u^{(i)}$ that is admissible with respect to $Q(x)$ and the system
$(f, g)$.

Proof: Since any admissible saturated control can be thought of as $u^{(0)}$, then from
Lemma 3.3, $u^*$ has a stability region that is the largest of any other saturated
control that is admissible with respect to $Q(x)$ and the system $(f, g)$.

Note that there may be stabilizing saturated controls that have larger stability
regions than $u^*$, but are not admissible with respect to $Q(x)$ and the system
$(f, g)$.


3.2  Constrained  States 


In the literature, there exist several techniques that find a domain of initial
states such that starting within this domain guarantees that a specific control
policy will not violate the constraints (Gilbert and Tan, 1991). However, we are
interested in improving given control laws so that they do not violate specific state
space constraints. For this we choose a nonquadratic performance functional, (32), in
which $n_c$ and $B_l$ are the number of constrained states and the upper bound on
$x_l$, respectively. The integer $k$ is an even number, and $\alpha_l$ is a small positive
number. As $k$ increases, and $\alpha_l \to 0$, the nonquadratic term will dominate the
quadratic term when the state space constraints are violated. However, the
nonquadratic term will be dominated by the quadratic term when the state space
constraints are not violated. Note that in this approach, the constraints are
considered soft constraints that can be hardened by using higher values for $k$ and
smaller values for $\alpha_l$.
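
The interplay between the quadratic and nonquadratic terms can be illustrated with a small sketch. The specific penalty form used here, $(x_l / (B_l - \alpha_l))^k$ added to $x^T x$, is an assumption made only for illustration, as are the function name and the numerical values.

```python
def state_cost_terms(x, bounds, alphas, k):
    """Return (quadratic term, assumed soft-constraint term) for state x.

    bounds and alphas map a constrained state index l to B_l and alpha_l.
    The form (x_l / (B_l - alpha_l))**k is an assumption for illustration.
    """
    quadratic = sum(xi * xi for xi in x)
    soft = sum((x[l] / (bounds[l] - alphas[l])) ** k for l in bounds)
    return quadratic, soft

# One constrained state (index 0) with B = 3 and alpha = 1.
inside = state_cost_terms([1.0, 0.0], bounds={0: 3.0}, alphas={0: 1.0}, k=10)
outside = state_cost_terms([3.5, 0.0], bounds={0: 3.0}, alphas={0: 1.0}, k=10)
outside_harder = state_cost_terms([3.5, 0.0], bounds={0: 3.0}, alphas={0: 1.0}, k=20)
inside_harder = state_cost_terms([1.0, 0.0], bounds={0: 3.0}, alphas={0: 1.0}, k=20)
print(inside, outside, outside_harder)
```

With these numbers the soft term is negligible well inside the bound and dominates beyond it, and raising $k$ sharpens both effects.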


3.3  Minimum  Time  Problems 

For systems with saturated actuators, we want to find the control signal required to
drive the system to the origin in minimum time. This is done through the following
performance functional

$V = \int_0^\infty \left[ \tanh(x^T Q x) + 2\int_0^u (\phi^{-1}(v))^T R\,dv \right] dt.$ (33)

By choosing the coefficients of the weighting matrix $R$ very small, and for
$x^T Q x \gg 0$, the performance functional becomes

$V = \int_0^{t_s} 1\,dt,$ (34)

and for $x^T Q x \approx 0$, the performance functional becomes

$V = \int_{t_s}^\infty \left[ x^T Q x + 2\int_0^u (\phi^{-1}(v))^T R\,dv \right] dt.$ (35)

Equation (34) represents the performance functionals usually used in minimum-time
optimization, because the only way to minimize (34) is by minimizing $t_s$.

Around the time $t_s$, the performance functional slowly switches to a nonquadratic
regulator that takes into account the actuator saturation.

Note that this method allows an easy formulation of a minimum-time problem, and that
the solution follows from the successive approximation technique. The solution is a
nearly minimum-time controller that is easier to find compared with techniques aimed at
finding  the  exact  minimum-time  controller.  Finding 
an  exact  minimum-time  controller  requires  finding  a 
bang-bang  controller  based  on  a  switching  surface 
that  is  hard  to  determine  (Lewis  and  Syrmos,  1995; 
Kirk,  1970). 
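
The two regimes of the state penalty in (33) can be seen in a short numerical sketch. The diagonal weighting and the sample states are illustrative assumptions, and the function name is hypothetical.

```python
import math

def min_time_state_cost(x, q_diag):
    """Return (tanh(x^T Q x), x^T Q x) for a diagonal Q given by q_diag."""
    quad = sum(q * xi * xi for q, xi in zip(q_diag, x))
    return math.tanh(quad), quad

far_cost, far_quad = min_time_state_cost([3.0, 3.0], [1.0, 1.0])
near_cost, near_quad = min_time_state_cost([0.05, 0.05], [1.0, 1.0])
print(far_cost, near_cost, near_quad)
```

Far from the origin the integrand is essentially 1, which matches (34), and near the origin it is essentially the quadratic form, which matches the state term of (35).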

Having the successive approximation theory well set for nonquadratic functionals, in
the next section we will introduce a neural network approximation of the value
function, and employ the successive solutions method in a least-squares sense over a
compact set of the stability region, $\Omega$. This is far simpler than the Galerkin
approximation appearing in (Beard, 1995).


4.  NEURAL  NETWORK  LEAST-SQUARES 
APPROXIMATE  HJB  SOLUTION 

Although equation (13), with the control obtained from (14) at the previous iteration
substituted in, is a linear differential equation, it is still difficult to solve for
the cost function $V^{(i)}(x)$. Therefore, neural networks are now used to approximate
the solution for the cost function $V^{(i)}(x)$ at each successive iteration $i$.
Moreover, to approximate integration, a mesh is introduced in $\mathbb{R}^n$. This yields
an efficient, practical, and computationally tractable solution algorithm for general
nonlinear systems with state and control constraints.


4.1 Neural Network Approximation of the Cost
Function V(x)


It is well known that neural networks can be used to approximate smooth functions on
prescribed compact sets (Lewis, et al., 1999). Since our analysis is restricted to a
stability region, which is a compact set, neural networks are natural for our
application. Therefore, to successively solve (13), (14) for constrained control
systems, we approximate $V^{(i)}(x)$ with a neural network

$V_L^{(i)}(x) = \sum_{j=1}^{L} w_j \sigma_j(x) = w_L^T \sigma_L(x),$ (36)

where the activation functions $\sigma_j(x)$ are continuous, $\sigma_j(0) = 0$, and
$\mathrm{span}\{\sigma_j\}_1^\infty \subseteq L_2(\Omega)$. The neural network weights are $w_j$ and $L$ is
the number of hidden-layer neurons. The vectors
$\sigma_L(x) \equiv [\sigma_1(x)\ \sigma_2(x)\ \cdots\ \sigma_L(x)]^T$ and $w_L \equiv [w_1\ w_2\ \cdots\ w_L]^T$
are the vector activation function and the vector weight, respectively. The neural
network weights will be tuned to minimize the residual error in a least-squares sense
over a set of points within the stability region of the initial stabilizing control.
The least-squares solution attains the lowest possible residual error with respect to
the neural network weights.


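For $GHJB(V, u) = 0$, the solution $V$ is replaced with $V_L$, having a residual error

$GHJB\left(V_L = \sum_{j=1}^{L} w_j \sigma_j(x),\ u\right) = e_L(x).$ (37)

To find the least-squares solution, the method of weighted residuals is used
(Finlayson, 1972). The weights $w_L$ are determined by projecting the residual error
onto $\partial e_L(x) / \partial w_L$ and setting the result to zero $\forall x \in \Omega$, i.e.

$\left\langle \frac{\partial e_L(x)}{\partial w_L},\ e_L(x) \right\rangle = 0.$ (38)

When expanded, with the derivative of the residual written out as
$\partial e_L / \partial w_L = \nabla\sigma_L (f + gu)$, equation (38) becomes

$\langle \nabla\sigma_L (f + gu),\ \nabla\sigma_L (f + gu) \rangle\, w_L + \left\langle Q + 2\int_0^u (\phi^{-1}(v))^T R\,dv,\ \nabla\sigma_L (f + gu) \right\rangle = 0.$ (40)

The following technical results are needed.

Lemma 4.1: If the set $\{\sigma_j\}_1^L$ is linearly independent and $u \in \Psi(\Omega)$, then
the set

$\{\nabla\sigma_j^T (f + gu)\}_1^L$

is also linearly independent.
Proof: See [3].

From Lemma 4.1, equation (40) can be rewritten, after defining
$\{\nabla\sigma_L (f + gu)\} \triangleq \{\theta_j\}$, as

$\langle \nabla\sigma_L (f + gu),\ \theta_j \rangle\, w_L + \left\langle Q + 2\int_0^u (\phi^{-1}(v))^T R\,dv,\ \theta_j \right\rangle = 0, \quad j = 1, \ldots, L.$ (42)

Because of Lemma 4.1, the term $\langle \nabla\sigma_L (f + gu),\ \theta_j \rangle$ is of full rank,
and thus is invertible. Therefore a unique solution for $w_L$ exists. We can solve
equation (42) for $w_L$ as follows,

$w_L = -\langle \nabla\sigma_L (f + gu),\ \theta_j \rangle^{-1} \left\langle Q + 2\int_0^u (\phi^{-1}(v))^T R\,dv,\ \theta_j \right\rangle, \quad j = 1, \ldots, L.$ (43)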

4.2 Introducing a Mesh in $\mathbb{R}^n$

Evaluating the integrals in (43) is computationally expensive. However, the
integrations can be approximated to a suitable degree using the Riemann definition of
integration. This results in a nearly optimal, computationally tractable solution
algorithm.

Lemma 4.2: Riemann Approximation of Integrals
An integral can be approximated as

$\int_a^b f(x)\,dx = \lim_{\|\Delta x\| \to 0} \sum_{i=1}^{p} f(x_i)\,\Delta x,$ (44)

where $\Delta x = x_i - x_{i-1}$ and $f$ is bounded on $[a, b]$

(Burk,  1998). 
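
A minimal sketch of the approximation in (44), using right endpoints on a uniform mesh; the function name and the test integrand $x^2$ on $[0, 1]$ are illustrative assumptions.

```python
def riemann_sum(f, a, b, p):
    """Right-endpoint Riemann sum of f over [a, b] with p equal subintervals."""
    dx = (b - a) / p
    return sum(f(a + i * dx) for i in range(1, p + 1)) * dx

coarse = riemann_sum(lambda x: x * x, 0.0, 1.0, 10)
fine = riemann_sum(lambda x: x * x, 0.0, 1.0, 1000)
print(coarse, fine)  # both approximate the exact value 1/3
```

Refining the mesh brings the sum closer to the exact integral, at the price of more function evaluations.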


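Introducing a mesh on $\Omega$, with mesh size equal to $\Delta x$, we can rewrite some terms
of (43) as follows:

$X = \left[\ \nabla\sigma_L (f + gu)\big|_{x_1}\ \cdots\ \nabla\sigma_L (f + gu)\big|_{x_p}\ \right]^T,$ (45)

$Y = \left[\ \left(Q + 2\int_0^u (\phi^{-1}(v))^T R\,dv\right)\Big|_{x_1}\ \cdots\ \left(Q + 2\int_0^u (\phi^{-1}(v))^T R\,dv\right)\Big|_{x_p}\ \right]^T,$ (46)

where $p$ in $x_p$ represents the number of points of the mesh. This number increases
as the mesh size is reduced.

Using Lemma 4.2, we have

$\langle \nabla\sigma_L (f + gu),\ \theta_j \rangle = \lim_{\|\Delta x\| \to 0} (X^T X)\,\Delta x,$

$\left\langle Q + 2\int_0^u (\phi^{-1}(v))^T R\,dv,\ \theta_j \right\rangle = \lim_{\|\Delta x\| \to 0} (X^T Y)\,\Delta x.$ (47)

This implies that we can calculate $w_L$ as

$w_L = -(X^T X)^{-1} (X^T Y).$ (48)

An interesting observation is that equation (48) is the standard least-squares method
of estimation for a mesh on $\Omega$. Note that the mesh size $\Delta x$ should be such that the
number of points $p$ is greater than or equal to the order of approximation $L$. This
is required for $(X^T X)$ to have full rank.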
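
The least-squares step (48) together with the control update (14) can be sketched on a scalar example. Everything specific here is an assumption made for illustration: the plant $\dot{x} = -x + u$, $Q(x) = x^2$, $R = 1$, $\phi = \tanh$, the two-term basis $\sigma = [x^2, x^4]$, the initial admissible control $u = 0$, and all function names. The mesh spacing of 0.025 on $[-1, 1]$ mirrors the one quoted in the numerical examples.

```python
import math

def grad_sigma(x):
    """Gradient of the assumed basis sigma(x) = [x**2, x**4]."""
    return (2.0 * x, 4.0 * x ** 3)

def penalty(u):
    """W(u) of (12) for phi = tanh and R = 1 (closed form of the integral)."""
    return 2.0 * (u * math.atanh(u) + 0.5 * math.log(1.0 - u * u))

def control(w, x):
    """Saturated control (14) built from the weights w, with g = 1."""
    dV = sum(wj * gj for wj, gj in zip(w, grad_sigma(x)))
    return -math.tanh(0.5 * dV)

def ghjb_least_squares(u, mesh):
    """Solve (48), w = -(X^T X)^{-1} X^T Y, for the two-term basis."""
    a11 = a12 = a22 = b1 = b2 = 0.0
    for x in mesh:
        ux = u(x)
        xdot = -x + ux                      # f(x) + g(x)u(x) for x' = -x + u
        r1, r2 = (gj * xdot for gj in grad_sigma(x))  # one row of X, as in (45)
        y = x * x + penalty(ux)             # one entry of Y, as in (46)
        a11 += r1 * r1
        a12 += r1 * r2
        a22 += r2 * r2
        b1 += r1 * y
        b2 += r2 * y
    det = a11 * a22 - a12 * a12             # 2x2 normal equations, Cramer's rule
    return (-(a22 * b1 - a12 * b2) / det, -(a11 * b2 - a12 * b1) / det)

mesh = [-1.0 + 0.025 * i for i in range(81)]
u = lambda x: 0.0                           # initial admissible control
weights = []
for _ in range(10):
    w = ghjb_least_squares(u, mesh)         # policy evaluation on the mesh
    weights.append(w)
    u = lambda x, w=w: control(w, x)        # policy improvement via (14)
value_at_one = [w[0] + w[1] for w in weights]
print(weights[0], weights[-1], value_at_one[-1])
```

For the initial control the exact solution $V = x^2 / 2$ lies in the span of the basis, so the first least-squares solve recovers it, and later iterations lower the approximate cost while keeping $|u| < 1$.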


5.  NUMERICAL  EXAMPLES 

We  now  show  the  power  of  our  neural  network 
control  technique  of  finding  nearly  optimal  nonlinear 
controllers  for  nonlinear  systems.  Two  examples  are 
presented. 

5. 1  Nonlinear  oscillator  with  actuator  saturation 

We  consider  next  a  nonlinear  oscillator  having  the 
dynamics 

X,  =  X,  +  A,  -  x,(x?  +  4), 
x2  =  -*,  +X2  -*2(*,2  +  M?)+  U. 


to  the  6th  order  powers  are  used,  convergence  of  the 
iteration  to  admissible  controls  was  not  observed. 

The  state  feedback  control  u  =  sat+l  (-5-y  -3^ ,)  is 

used  as  an  initial  stabilizing  control  for  the  iteration. 
This  is  found  after  linearizing  the  nonlinear  system 
around  the  origin,  and  building  an  unconstrained  state 
feedback  control  which  makes  the  eigenvalues  of  the 
linear  system  all  negative.  Figure  1  shows  the 
performance  of  the  bounded  controller 
it  =  jffl/*,1  ,)  *  Note  that  it  is  not  good. 

The  nearly  optimal  saturated  control  law  is  now 
found  through  the  technique  presented  in  this  paper. 


Slats  Trajectoty  (or  Initial  Stabilizing  Control 


m  » 

Tim»{s) 


Fig.  1.  Performance  of  the  initial  stabilizing  saturated 
control. 


It  is  desired  to  control  the  system  with  control  limits 
of  |u|<l.  The  following  smooth  function  is  used  to 

approximate  the  value  function  of  the  system, 

Vl\ Cv I  >*2  )  =  W\X]  +  ^2*2  +  VV3 X, x2  +  VVjXj*  +  + 

wGx]x2  +  w7x*x\  +  >v8x,x23  +  wtr xf  +  wlox2 

Wl\XlX2  +  W\2X*X1  +  VV|^^2  +  W\4X\X2  +  W\5X  rV2  + 

+  W17jt$  +  wxtx]x2  +wlvxjxj  +  w2lrx]x  2 
+wux*\x2  +w22ri'T!  “xj  +wlrx  ,x2 

This  neural  network  has  24  activation  functions 
containing  powers  of  the  state  variable  of  the  system 
up  to  the  8th  power.  The  complexity  of  the  neural 
network  is  selected  to  guarantee  convergence  of  the 
algorithm  to  an  admissible  control  law.  When  only  up 


The  algorithm  is  run  over  the  region 
-I<x,  <1,  -l£x2<  1,  with  a  mesh  size  0.025,  and 
R  =  I,  Q=12j2 •  After  20  successive  iterations,  the 
nearly  optimal  saturated  control  law  is  found  to  be, 
r2.62xl  +4.23x,  +  0.39*23 -4.0*,3 -8.65a,3*, ' 
-8.94a;  4  ~ 5-53*2  +2.26x1  +  5.78*;*,  + 
u  =  - tanh  1 1 .00*3 *>  +2.5144  +  2-00*,  x2  +  2.08*,3 
-0.49*,7  - 1 .65*;*,  -  2.7  l*f *;  -  2. 1 9xt\x\ 
-0.76*,3*,4  +l.77*r*;  +  0.87*,  *,6 

This is the control law in terms of a neural network.
Note that the controller in Figure 2 outperforms the
initial stabilizing controller shown in Figure 1.
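To make the structure of this control law concrete, the sketch below evaluates it as a polynomial passed through a hyperbolic tangent. The coefficients are those printed above; the assignment of exponents to each coefficient is an assumption, reconstructed from the derivative of the 24-term value function with respect to $x_2$. The names `TERMS` and `saturated_control` are hypothetical.

```python
import math

# (coefficient, power of x1, power of x2); exponents are a reconstruction (assumption)
TERMS = [
    (2.62, 1, 0), (4.23, 0, 1),
    (0.39, 0, 3), (-4.0, 3, 0), (-8.65, 2, 1), (-8.94, 1, 2),
    (-5.53, 0, 5), (2.26, 5, 0), (5.78, 4, 1), (11.00, 3, 2),
    (2.51, 2, 3), (2.00, 1, 4),
    (2.08, 0, 7), (-0.49, 7, 0), (-1.65, 6, 1), (-2.71, 5, 2),
    (-2.19, 4, 3), (-0.76, 3, 4), (1.77, 2, 5), (0.87, 1, 6),
]


def saturated_control(x1, x2):
    """Polynomial in the states, squashed by tanh so that |u| never exceeds 1."""
    z = sum(c * x1**p * x2**q for c, p, q in TERMS)
    return -math.tanh(z)


# The saturation limit holds at every point of a coarse grid over the training region.
grid = [i / 10.0 for i in range(-10, 11)]
print(max(abs(saturated_control(a, b)) for a in grid for b in grid) <= 1.0)
```

Near the origin the first two terms dominate, so the law behaves like a linear state feedback; far from the origin the hyperbolic tangent enforces the actuator limit.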


[Figure: State Trajectory for the Nearly Optimal Control Law]


Fig.  2.  Nearly  optimal  nonlinear  control  law 
with  actuator  saturation 


5.2  Constrained  state  linear  system 


Consider the following system

$$\dot{x}_1 = x_2,$$
$$\dot{x}_2 = x_1 + x_2 + u,$$
$$|x_1| \le 3.$$

For this we select the following performance
functional

$$Q(x, k) = x_1^2 + x_2^2 + \left(\frac{x_1}{B_1 - \alpha_1}\right)^{k},$$
$$W(u) = u^2.$$

Note that we have chosen the coefficient $k$ to be 10,
with $B_1 = 3$ and $\alpha_1 = 1$. One reason for selecting
$k = 10$ is that a larger value of $k$ requires many
activation functions, a large number of which must have
powers higher than $k$. However, since this simulation
was carried out in double-precision arithmetic, power
terms higher than 14 do not add up well, and round-off
errors seriously affect the determination of the neural
network weights by causing a rank deficiency.

An initial stabilizing controller, the LQR
$u = \dots - 3.6x_2$, which violates the state constraints,
is shown in Figure 3. The performance of this controller
is improved by stochastically sampling 3000 times
from the region $-3.5 \le x_1 \le 3.5$, $-5 \le x_2 \le 5$, and
running the successive approximation algorithm
20 times.


Fig. 3. LQR control without considering the state
constraint.


It can be seen that the nearly optimal control law that
considers the state constraint tends not to violate the
state constraint, whereas the LQR controller does. It is
important to realize that, as we increase the order $k$
in the performance functional, we get larger and
larger control signals at the start of the
control process to avoid violating the state
constraints.
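The effect of the order $k$ can be illustrated numerically. The sketch below assumes the state-constraint term has the form $(x_1/(B_1 - \alpha_1))^k$ with $B_1 = 3$ and $\alpha_1 = 1$ and an even $k$; that form and the function name `state_penalty` are assumptions made for illustration only.

```python
def state_penalty(x1, k, bound=3.0, alpha=1.0):
    """Assumed constraint term: small while |x1| < bound - alpha, steep beyond it."""
    return (x1 / (bound - alpha)) ** k


# Compare several even orders at points approaching the bound |x1| = 3.
for k in (2, 6, 10):
    values = [state_penalty(x, k) for x in (1.0, 2.0, 2.4, 3.0)]
    print(k, [round(v, 3) for v in values])
```

As $k$ grows, the term becomes negligible well inside the constraint and rises sharply near it, which is consistent with the larger initial control effort noted above.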

A smooth function of the order 45 that resembles the
one used in example 5.1 is used to approximate the
value function of the system. The weights $W_{45}$ are
found by successive approximation. Since $R = 1$, the
final control law becomes

$$u(x) = -\frac{1}{2}\,\frac{\partial V}{\partial x_2}.$$
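The successive approximation step and the control update can be sketched on a simplified version of this example. The code below drops the state-constraint term, keeps only $Q = x_1^2 + x_2^2$ and $W(u) = u^2$, and uses a three-term quadratic basis instead of the 45-term network, so it reduces to the classical unconstrained iteration. The initial gain `[5.0, 5.0]`, the sample count, and all function names are hypothetical choices.

```python
import random


def solve_linear(M, b):
    """Solve M z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(M, b)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(c + 1, n):
            factor = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= factor * aug[c][k]
    z = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][k] * z[k] for k in range(r + 1, n))
        z[r] = (aug[r][n] - tail) / aug[r][r]
    return z


def successive_approximation(K, iterations=20, samples=200, seed=0):
    """Iterate: fit V = w0*x1^2 + w1*x1*x2 + w2*x2^2 for u = -K x, then update K."""
    rng = random.Random(seed)
    pts = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(samples)]
    w = [0.0, 0.0, 0.0]
    for _ in range(iterations):
        M = [[0.0] * 3 for _ in range(3)]
        b = [0.0] * 3
        for x1, x2 in pts:
            u = -(K[0] * x1 + K[1] * x2)
            f1, f2 = x2, x1 + x2 + u                      # closed-loop dynamics
            grads = [(2 * x1, 0.0), (x2, x1), (0.0, 2 * x2)]
            phi = [g1 * f1 + g2 * f2 for g1, g2 in grads]
            y = -(x1 * x1 + x2 * x2 + u * u)              # -(Q(x) + W(u))
            for i in range(3):
                b[i] += phi[i] * y
                for j in range(3):
                    M[i][j] += phi[i] * phi[j]
        w = solve_linear(M, b)                            # least-squares weights
        K = [0.5 * w[1], w[2]]                            # u = -(1/2) dV/dx2
    return w, K


weights, gain = successive_approximation([5.0, 5.0])
print([round(g, 3) for g in gain])
```

Under these simplifying assumptions the gains should approach the Riccati solution, roughly 2.41 and 3.61, and the second value agrees with the $-3.6$ gain quoted for the LQR in this example.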

It was noted that the nonquadratic performance
functional returns an overall cost of 212.33 when
the initial conditions are $x_1 = 2.4$, $x_2 = 5.0$ for the
optimal controller, while this cost increases to 316.07
when the linear controller is used. It is this increase in
cost, detected by the nonquadratic performance
functional, that causes the system to avoid violating
the state constraints. If this difference in costs is
made bigger, then we actually increase the set of
initial conditions that do not violate the constraint.
This, however, requires a larger neural network and
high-precision computing machines.


Fig. 4. Nearly optimal nonlinear control law
considering the state constraint.


5.3  Minimum  time  control 


Consider the following system

$$\dot{x}_1 = x_2,$$
$$\dot{x}_2 = -x_2 + u.$$


It is desired to control the system with control limits
$|u| \le 1$ to drive it to the origin in minimum time.

Typically, from classical optimal control theory
(Kirk, 1970), the control law required
is a bang-bang controller that switches back and forth
based on a switching surface that is calculated using
Pontryagin's minimum principle. It follows that the
minimum-time control law for this system is given by


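$$s(x) = x_1 - \operatorname{sign}(x_2)\,\ln(|x_2| + 1) + x_2,$$

$$
u(x) =
\begin{cases}
-1, & \text{for } x \text{ such that } s(x) > 0, \\
+1, & \text{for } x \text{ such that } s(x) < 0, \\
-1, & \text{for } x \text{ such that } s(x) = 0 \text{ and } x_2 > 0, \\
+1, & \text{for } x \text{ such that } s(x) = 0 \text{ and } x_2 < 0, \\
0, & \text{for } x = 0.
\end{cases}
$$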
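The switching law can be exercised in a short simulation. The sketch below assumes the plant $\dot{x}_1 = x_2$, $\dot{x}_2 = -x_2 + u$ and the switching function $s(x) = x_1 - \operatorname{sign}(x_2)\ln(|x_2| + 1) + x_2$; the forward-Euler integration, the step size, the stopping tolerance, and the function names are hypothetical choices made for illustration.

```python
import math


def switching_function(x1, x2):
    """s(x) = x1 - sign(x2) * ln(|x2| + 1) + x2."""
    sign = (x2 > 0) - (x2 < 0)
    return x1 - sign * math.log(abs(x2) + 1.0) + x2


def minimum_time_control(x1, x2):
    """Bang-bang law: the sign of s(x) selects the extreme control value."""
    if x1 == 0.0 and x2 == 0.0:
        return 0.0
    s = switching_function(x1, x2)
    if s > 0.0:
        return -1.0
    if s < 0.0:
        return 1.0
    return -1.0 if x2 > 0.0 else 1.0   # on the switching curve itself


def time_to_origin(x1, x2, dt=1e-3, tol=0.05, t_max=10.0):
    """Forward-Euler simulation; returns the first time with |x| < tol, else None."""
    t = 0.0
    while t < t_max:
        if math.hypot(x1, x2) < tol:
            return t
        u = minimum_time_control(x1, x2)
        x1, x2 = x1 + dt * x2, x2 + dt * (-x2 + u)
        t += dt
    return None


print(time_to_origin(1.0, 0.0))
```

Starting from $(1, 0)$, the control is $-1$ until the state reaches the switching curve and $+1$ afterwards, so a single switch brings the state to a neighbourhood of the origin.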


The response to this controller is shown in Figure 5. It
can be seen that this is a highly nonlinear control law
that requires the calculation of a switching surface,
which is a formidable task even for linear
systems with state dimension larger than 3. With the
method presented in this paper, however,


finding a nearly minimum-time controller becomes a
less complicated matter.


We  use  the  following  nonquadratic  performance 
functional, 


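$$Q(x) = \tanh\!\left(\left(\frac{x_1}{0.1}\right)^{2} + \left(\frac{x_2}{0.1}\right)^{2}\right),$$

$$W(u) = 0.001 \times 2\int_0^u \tanh^{-1}(\mu)\,d\mu.$$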
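The control penalty has a simple closed form, and its small weight explains the near bang-bang behaviour of the resulting controller. The sketch below assumes $W(u) = 2r\int_0^u \tanh^{-1}(\mu)\,d\mu$ with $r = 0.001$ and a control that enters the dynamics affinely, so that minimizing $W(u) + c\,u$ over $u$ gives $u = -\tanh(c/(2r))$, where $c$ stands for the gradient of the value function along the input direction. The function names and the symbol $c$ are hypothetical.

```python
import math


def w_closed_form(u, r=0.001):
    """2 r * integral_0^u atanh(v) dv = 2 r * (u*atanh(u) + 0.5*ln(1 - u^2))."""
    return 2.0 * r * (u * math.atanh(u) + 0.5 * math.log(1.0 - u * u))


def w_numeric(u, r=0.001, n=20000):
    """The same integral by the midpoint rule, as a cross-check."""
    h = u / n
    return 2.0 * r * h * sum(math.atanh((i + 0.5) * h) for i in range(n))


def minimizing_control(c, r=0.001):
    """Stationary point of W(u) + c*u: 2 r atanh(u) + c = 0."""
    return -math.tanh(c / (2.0 * r))


print(abs(w_closed_form(0.9) - w_numeric(0.9)) < 1e-9)
print(minimizing_control(0.01), minimizing_control(0.0001))
```

With $r = 0.001$, even a small gradient term saturates the hyperbolic tangent, while very small gradients near the origin give a smooth, regulator-like control.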

A smooth function of the order 35 is used to
approximate the value function of the system. We
solve for this network by stochastic sampling, drawing
5000 samples from the region
$-0.5 \le x_1 \le 0.5$, $-0.5 \le x_2 \le 0.5$. The weights $W_{35}$ are
found by 20 iterations of successive approximation.
Since $R = 1$, the final control law becomes,


[Figure: Minimum Time Control using Pontryagin minimum principle; horizontal axis: Time (s)]


Fig.  5.  Performance  of  the  exact  minimum-time 
controller. 


Figure 6 shows the performance of the controller
obtained using the algorithm presented in this paper.
To show how close the performance of this controller is
to that of the exact minimum-time controller, Figure 7 plots
the state trajectories of both controllers. Note that the
nearly minimum-time controller behaves as a bang-bang
controller until the states come close to the
origin, where it starts behaving as a regulator.


[Figure; horizontal axis: Time (s)]


Fig.  6.  Performance  of  the  nearly  minimum-time 
controller. 


[Figure: State Evolution for both controllers]


Fig. 7. State evolution for both minimum-time
controllers.


6.  CONCLUSION 

A rigorous, computationally effective algorithm for
finding nearly optimal control laws under various
constraints and requirements has been presented. The
successive  approximation  theory  has  been  extended 
to  include  nonquadratic  performance  functionals.  The 
result  is  a  set  of  control  laws  that  are  in  state 
feedback  form  for  general  nonlinear  systems.  The 
control  is  given  as  the  output  of  a  neural  network. 
This  is  an  extension  of  the  novel  work  by  Lyshevski 
(2001)  and  Beard  (1995).  Three  numerical  examples 
were  used  to  demonstrate  the  effectiveness  of  this 
technique. 